-
-
Notifications
You must be signed in to change notification settings - Fork 4
Manage batch processing in Debezium Extensions for Quarkus #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
…yncEngineBuilderFactory Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
…events Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
Signed-off-by: kmos <[email protected]>
| <dependency> | ||
| <groupId>org.apache.kafka</groupId> | ||
| <artifactId>connect-api</artifactId> | ||
| <scope>provided</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quarkus-debezium-engine-spi doesn't work without the engine or a connector that contains the connect-api but I can change the scope
|
|
||
| import org.apache.kafka.connect.source.SourceRecord; | ||
|
|
||
| public interface BatchEvent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to operate at this level? Can't we extend CapturingEvent class to add commit or similar methods.
| public interface BatchEvent { | |
| public interface ComittingCapturingEvent extends CapturingEvent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the name. In any case this class contains the record as SourceRecord and the serialized. In some case, I have noticed that inside sinks are used both (SourceRecord and serialized). I didn't used CapturingEvent because I used the concept for entire list of messages
| private final AtomicInteger isCapturingFilteredEvent = new AtomicInteger(0); | ||
|
|
||
| @Capturing(destination = "topic.inventory.products") | ||
| public void capture(CapturingEvents<BatchEvent> events) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we take a bit different approach so the user will use somehing like
| public void capture(CapturingEvents<BatchEvent> events) { | |
| public void capture(List<CommittedCapturingEvent> events) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used an approach with CapturingEvents<?> because the events are flowing with some common details:
- engine (in a multi-engine configuration for example)
- source
- destination
all the events should have the same engine, source and destination which means having these information repeated in all the events
| private final Logger logger = LoggerFactory.getLogger(GeneralChangeConsumer.class); | ||
|
|
||
| @Override | ||
| public void handleBatch(List<ChangeEvent<Object, Object>> records, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you try to improve the readability of the method?
| /*** | ||
| * @return engine for which the events are emitted | ||
| */ | ||
| String engine(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think have sense to add here a void commitBatch method instead to automatically call it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean markBatchFinished()? Yeah, I was thinking about that. All the actual debezium-server-sink use the markBatchFinished() as last statement without taking any action on it so I postponed the development of the API. In any case I'll prefer to find a different name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see the differences:
- auto commit the batch: In that case if there is an error processing a particular record of the batch, the batch will be commited and so the processed record will not be re-processed if a restart occurs.
- explicit batch commit: if an error occurs the user could decide to still committ the batch, in that case we have the same behavior, but an user could decide to not commit the batch to reprocess the whole batch.
Honestly dunno if reprocessing the batch is something really used. So maybe we can leave the auto commit?
|
Overall LGTM, I just left some minor comment |
closes debezium/dbz#1484